Understanding Complaints and Praises of Woohoo Gift Card – Google Reviews

BA07_Capstone Project Report_HARSHA GV

Data Cleaning / EDA

There are no missing values in the data.

DATA Cleaning

Step 1 - convert to lower case

Step 2 - Remove square brackets, numbers, and punctuation
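Steps 1 and 2 can be sketched with a small cleaning function. The exact implementation in the notebook is not shown, so this is a minimal stdlib version assuming regular expressions are used; the sample input is illustrative.

```python
import re

# Minimal sketch of Steps 1-2 (assumed implementation, not the report's exact code).
def clean_text(text):
    text = text.lower()                  # Step 1: convert to lower case
    text = re.sub(r'\[.*?\]', '', text)  # Step 2: remove square brackets and their contents
    text = re.sub(r'\d+', '', text)      # Step 2: remove numbers
    text = re.sub(r'[^\w\s]', '', text)  # Step 2: remove punctuation
    return text.strip()

print(clean_text("Hello, World 123!"))  # hello world
```

Applied to the review column, this would produce the `cleaned_description_new` column used below.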

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

Visualize the word cloud.

Concatenate all the words in the data to form a string:

str_data = " "
data_dump = df['cleaned_description_new']
for record in data_dump:
    str_data = str_data + " " + record

wordcloud = WordCloud(
    stopwords=STOPWORDS,
    background_color='white',
    width=1200,
    height=1000
).generate(str_data)

plt.imshow(wordcloud)
plt.axis('off')
plt.show()

Step 3 - Tokenization

Create the tokens in a separate column in the dataframe.

Step 4 - Removing Stopwords

We append the output to the original dataset as a new column.
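Steps 3 and 4 can be illustrated as follows. The report most likely uses NLTK's tokenizer and stopword list; this stdlib sketch with a tiny illustrative stopword subset only shows the idea.

```python
# Tiny illustrative stopword subset (NLTK's English list has ~180 words).
STOPWORDS_MINI = {"the", "a", "an", "is", "it", "and", "to", "of"}

def tokenize(text):
    # Step 3: tokenization (whitespace split; NLTK's word_tokenize is typical)
    return text.split()

def remove_stopwords(tokens):
    # Step 4: drop stopwords, keep content words
    return [t for t in tokens if t not in STOPWORDS_MINI]

tokens = tokenize("the card is easy to use")
print(remove_stopwords(tokens))  # ['card', 'easy', 'use']
```

In the notebook, each review's token list would be stored in a new dataframe column.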

Step 5 - Stemming and Lemmatization

First we run the stemmer, then we run the lemmatizer.

The same code is used for both: run the stemmer first, then disable it and run the lemmatizer.

Stemming

Lemmatization
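Step 5 typically uses NLTK's PorterStemmer and WordNetLemmatizer. The crude suffix-stripping function below is only a sketch of what stemming does in spirit; it is not the Porter algorithm and the suffix list is an assumption for illustration.

```python
# Crude suffix-stripping sketch of stemming (NOT the Porter algorithm).
def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when a reasonable stem remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["cards", "loved", "loading"]])  # ['card', 'lov', 'load']
```

Note the difference in quality: a stemmer may produce non-words such as "lov", whereas a lemmatizer maps "loved" to the dictionary form "love", which is why the report runs both and compares.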

Downloading the data for review.

Uploading the file after converting the list "preprocessed_docs" to a string and assigning it to the new column "Word_final".

Checking the Distribution of Sentiment

import matplotlib.pyplot as plt
%matplotlib inline
print('Percentage distribution of sentiment\n')
print(round(df_word.Sentiment.value_counts(normalize=True) * 100, 2))
round(df_word.Sentiment.value_counts(normalize=True) * 100, 2).plot(kind='bar')
plt.title('Percentage Distributions by review type')
plt.show()

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

Visualize the word cloud on the preprocessed column.

Concatenate all the words in the data to form a string:

str_data = " "
data_dump = df_word['word_final']
for record in data_dump:
    str_data = str_data + " " + record

wordcloud = WordCloud(
    stopwords=STOPWORDS,
    background_color='white',
    width=2400,
    height=2000
).generate(str_data)

plt.imshow(wordcloud)
plt.axis('off')
plt.show()

Train Test Split
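The split itself is most likely done with `sklearn.model_selection.train_test_split`; the self-contained sketch below shows the same shuffle-and-cut idea with the standard library, using an assumed 80/20 ratio.

```python
import random

# Hypothetical sketch of a train/test split (the report likely uses
# sklearn's train_test_split; ratio and seed are assumptions).
def train_test_split_simple(rows, test_ratio=0.2, seed=42):
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # shuffle reproducibly
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]       # train, test

train_rows, test_rows = train_test_split_simple(range(10))
print(len(train_rows), len(test_rows))  # 8 2
```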

Lexicon Sentiment analysis

SENTIMENT ANALYSIS DEFINITION

In sentiment analysis we classify the polarity of a given text at the document, sentence, or feature level. It tells us whether the opinion expressed is positive, negative, or neutral. Going beyond polarity, more advanced analysis can identify emotional states such as anger, sadness, and happiness.

AFINN Analysis

On the cleaned data, using the column cleaned_description_new.
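AFINN scores a text by summing per-word valence ratings. The real AFINN lexicon contains roughly 3,300 words scored from -5 to +5 (usually accessed via the `afinn` package); the tiny dictionary below is an assumption made only to illustrate the mechanism.

```python
# Tiny illustrative AFINN-style lexicon (NOT the real list; scores assumed).
MINI_AFINN = {"love": 3, "great": 3, "happy": 3, "bad": -3, "waste": -1}

def afinn_score(text):
    # Sum the valence of every known word; unknown words score 0.
    return sum(MINI_AFINN.get(w, 0) for w in text.lower().split())

print(afinn_score("great card but bad delivery"))  # 0
print(afinn_score("love it"))                      # 3
```

A positive total suggests a positive review, a negative total a negative one, and 0 a neutral or mixed one.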

NRC Lexicon Sentiment Analysis

NRC Word-Emotion Association Lexicon

ADDING Positive

ADDING Negative

ADDING Anger

ADDING Anticipation

ADDING Disgust

ADDING Fear

ADDING Joy

ADDING Sadness

ADDING Surprise

ADDING Trust
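Adding one column per NRC category can be sketched as below. The word-to-emotion mappings shown are assumptions for illustration; the real NRC Word-Emotion Association Lexicon tags each word with up to ten categories (positive, negative, and the eight emotions listed above).

```python
# Illustrative mini NRC-style lexicon (mappings assumed, not the real lexicon).
MINI_NRC = {
    "love": {"joy", "positive", "trust"},
    "angry": {"anger", "negative"},
    "scam": {"fear", "disgust", "negative"},
}
EMOTIONS = ["positive", "negative", "anger", "anticipation", "disgust",
            "fear", "joy", "sadness", "surprise", "trust"]

def nrc_counts(text):
    # Count how many words in the text carry each emotion category.
    counts = {e: 0 for e in EMOTIONS}
    for w in text.lower().split():
        for e in MINI_NRC.get(w, set()):
            counts[e] += 1
    return counts

c = nrc_counts("love the card angry support")
print(c["positive"], c["anger"])  # 1 1
```

In the notebook, each of these ten counts would be added as its own dataframe column, one per "ADDING" step above.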

What is VADER ?

VADER stands for Valence Aware Dictionary and sEntiment Reasoner. It is a rule-based sentiment analyzer. It consists of a list of lexical features (e.g. words) that are generally labeled according to their semantic orientation as positive or negative.

Please install vaderSentiment if running it for the first time.
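VADER's `polarity_scores` returns a `compound` score in [-1, 1], which is then mapped to a sentiment label. The thresholding rule below uses the ±0.05 cutoffs recommended in the VADER documentation; how exactly the report maps scores to the `VSentiment` column is an assumption.

```python
# Map VADER's compound score to a label (±0.05 cutoffs per VADER's docs).
def vader_label(compound):
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"

print(vader_label(0.6), vader_label(-0.3), vader_label(0.0))  # Positive Negative Neutral
```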

pd.crosstab(train_v_table['Rating'], train_v_table['VSentiment'])

pd.crosstab(train_v_table.Rating, train_v_table.VSentiment).apply(lambda r: r/len(df), axis=1)

result = train_v_table['VSentiment'].value_counts()
result.plot(kind='bar', rot=0, color=['plum', 'cyan'])

Word Clouds

NRC Emotion Word Cloud

Naïve Bayesian model

Naive Bayes - Bag of Words (BoW) using CountVectorizer

Naive Bayes - TF-IDF
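Both model variants can be sketched in one loop: vectorize the preprocessed text (BoW via CountVectorizer, then TF-IDF via TfidfVectorizer) and fit a MultinomialNB classifier. The toy reviews and labels below are assumptions; the report would train on the "Word_final" column and the Sentiment labels instead.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy reviews standing in for the report's "Word_final" column.
texts = ["great card loved it", "easy to redeem great",
         "worst experience bad support", "bad card waste of money"]
labels = ["Positive", "Positive", "Negative", "Negative"]

preds = []
for vec in (CountVectorizer(), TfidfVectorizer()):  # BoW first, then TF-IDF
    X = vec.fit_transform(texts)                    # documents -> feature matrix
    model = MultinomialNB().fit(X, labels)          # Naive Bayes with Laplace smoothing
    preds.append(model.predict(vec.transform(["great easy card"]))[0])

print(preds)  # ['Positive', 'Positive']
```

The same train/test split from earlier would be used to compare the two vectorizations on held-out reviews.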

THE END